An unsupervised approach to creating web audio contents-based HMM voices

Authors

Jinfu Ni

Hisashi Kawai

Abstract

This paper presents an approach to the rapid, low-cost creation of varied synthetic voices. The approach consists of amassing audio content from the web, extracting usable speech from it, transcribing the speech to surface text and performing phone-time alignment, and using the speech and transcripts to build HMM-based voices. A set of experiments is conducted to evaluate this approach. The results indicate that large volumes of audio content are available on the internet, but more than 33.3% of web radio data are unusable for building voices because of noise, music, and overlapping speakers. Among the 14 voices built from limited radio monologues in Japanese, three are fair (the middle of the five-point scale) and two are bad (the lowest level). Erroneous transcripts have a significant influence on voice quality. To achieve fair voice quality with limited speech data, the phone and word accuracy of the speech transcriptions must be higher than 80% and 50%, respectively.
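
The accuracy thresholds in the abstract can be illustrated with a minimal sketch. The abstract does not say how phone and word accuracy were computed; the code below assumes the common edit-distance definition, accuracy = 1 - (substitutions + deletions + insertions) / reference length. The function names (`edit_distance`, `accuracy`, `meets_thresholds`) are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (words or phones)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - (S + D + I) / N, with N the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def meets_thresholds(phone_acc, word_acc, phone_min=0.80, word_min=0.50):
    """Check the abstract's condition: phone accuracy > 80%, word accuracy > 50%."""
    return phone_acc > phone_min and word_acc > word_min
```

Under this definition, a four-word reference with one substituted word yields a word accuracy of 0.75. Note that accuracy can go negative when the hypothesis contains many insertions.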

Similar resources

An investigation of the impact of speech transcript errors on HMM voices

Toward automatic creation of web-based voice fonts at low cost, automatic speech transcription technology is used to obtain the linguistic features for building HMM-based voices from audio web contents. This paper presents an investigation of the influences of erroneous transcripts on such voices. We simulate varied speech transcript errors by using a large vocabulary automatic speech recognize...

Some Aspects of ASR Transcription Based Unsupervised Speaker Adaptation for HMM Speech Synthesis

Statistical parametric synthesis offers numerous techniques to create new voices. Speaker adaptation is one of the most exciting ones. However, it still requires high-quality audio data with a low signal-to-noise ratio and precise labeling. This paper presents an automatic speech recognition based unsupervised adaptation method for Hidden Markov Model (HMM) speech synthesis and its quality evalu...

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...

Unsupervised onset detection: A probabilistic approach using ICA and a hidden Markov classifier

We describe an onset detection system that takes a two-stage approach in which both stages are based on unsupervised learning in a probabilistic model. The first stage uses independent component analysis (ICA) to fit a short-term non-Gaussian model to frames of audio data. This model is used to generate a reduced signal to be interpreted as the ‘surprisingness’ of the original audio signal. Our hypoth...

Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary contin...

Publication date: 2010
